feat(renderer): expose preserve_*_thinking flags in RendererConfig by hallerite · Pull Request #2436 · PrimeIntellect-ai/prime-rl

hallerite · 2026-05-07T18:01:55Z

Summary

Re-applies #2433 onto main. The original PR landed against feat/unify-inference-generate and was merged there by mistake.

Adds two new fields to RendererConfig:

preserve_all_thinking: bool = False
preserve_thinking_between_tool_calls: bool = False

Both RL and SFT forward the flags to keep tokenization byte-consistent:

Orchestrator create_renderer() — bound at construction on the training-side renderer used by build_trajectory_step / render_ids for trajectory tokenization.
Orchestrator setup_inference_pool() → setup_clients() → vf.ClientConfig — the verifiers RendererClient picks them up and forwards to its create_renderer_pool, so every inference render carries the same thinking-preservation behaviour.
SFT create_renderer() — same construction-time binding on the training renderer, so SFT tokenization matches the configured preservation policy.

Both flags are off by default → zero behaviour change for existing configs. Both the orchestrator and SFT validate_renderer_args validators reject either flag when use_renderer=False, matching the existing renderer knobs.

Per-flag semantics

preserve_all_thinking — emit reasoning_content for every past-assistant turn, even those before another user message.
preserve_thinking_between_tool_calls — emit reasoning_content only for the current tool cycle (post-last-user assistant→tool→…→assistant block). Older blocks fall back to template default. Strict subset of preserve_all_thinking.

Example config

[orchestrator.renderer]
name = "glm5"
preserve_all_thinking = true

Dependencies

No pyproject.toml / uv.lock changes. After the monorepo split in #2507, verifiers and renderers are uv workspace members vendored under deps/. Main's current submodule pins already include the upstream changes this PR depends on:

deps/verifiers @ 711a7c7 — 76 commits ahead of the verifiers PR (Run dir? #1298) that added preserve_*_thinking to ClientConfig.
deps/renderers @ 87084dc — 36 commits ahead of the renderers PR (Data streaming #4) that added construction-time preserve_*_thinking to create_renderer.

File-path delta vs #2433

The original PR targeted feat/unify-inference-generate where configs live under src/prime_rl/configs/. On main they live under packages/prime-rl-configs/src/prime_rl/configs/ (workspace-package split landed in #2416). The diff is otherwise identical.

🤖 Generated with Claude Code

Note

Medium Risk
Adds new renderer configuration knobs and threads them through orchestrator/inference client construction; while default-off, it touches rollout/inference plumbing and bumps verifiers plus adds a new renderers dependency, which could affect runtime compatibility.

Overview
Exposes two new RendererConfig booleans (preserve_all_thinking and preserve_thinking_between_tool_calls) and extends orchestrator validation to reject them unless orchestrator.use_renderer=true.

When the renderer client is enabled, these flags are forwarded both to create_renderer() and through setup_inference_pool()/setup_clients() into vf.ClientConfig (guarded so non-renderer clients don’t receive unknown extras), keeping training-side tokenization and inference rendering consistent.

Updates dependencies by adding a direct renderers git dependency and bumping the pinned verifiers revision; unit tests are adjusted to assert the new parameters are passed through.

^{Reviewed by Cursor Bugbot for commit cc381eb. Bugbot is set up for automated code reviews on this repo. Configure here.}

Adds two new fields to RendererConfig: - preserve_all_thinking - preserve_thinking_between_tool_calls The orchestrator forwards them in two places so train and infer stay consistent: 1. create_renderer() — bound at construction on the training-side renderer used by build_trajectory_step / render_ids. 2. setup_inference_pool() → setup_clients() → vf.ClientConfig — the verifiers RendererClient picks them up and forwards to its create_renderer_pool, so every inference render carries the same thinking-preservation behaviour. Both flags are off by default → zero behaviour change for existing configs. Setting either without orchestrator.use_renderer=True is rejected by validate_renderer_args, matching the existing renderer knobs. Bumps source pins: - verifiers 3b77145 → a7516a1 (PR #1298: ClientConfig propagation) - renderers (now a separate repo): pinned to fe67f9f (PR #4: construction-time preserve flags) Re-applies #2433, which was merged into feat/unify-inference-generate by mistake. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 2544d0c. Configure here.}

The flags were added to the shared RendererConfig but only wired up in the orchestrator. SFT also constructs a renderer via create_renderer for training-side tokenization, so it must forward both flags or silently ignore them. Also tighten validate_renderer_args to reject either flag when use_renderer=False, matching the orchestrator validator. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

samsja marked this pull request as ready for review May 16, 2026 06:30

hallerite force-pushed the feat/preserve-thinking-flags-main branch from cc381eb to 2544d0c Compare May 16, 2026 09:42

cursor Bot reviewed May 16, 2026

View reviewed changes

Comment thread packages/prime-rl-configs/src/prime_rl/configs/shared.py

hallerite requested review from mikasenghaas and samsja May 16, 2026 09:58

hallerite mentioned this pull request May 16, 2026

feat: renderer self-describes prefix-stability for SFT / RL / inference consumers PrimeIntellect-ai/renderers#41

Open

samsja approved these changes May 17, 2026

View reviewed changes

samsja merged commit ae610d9 into main May 17, 2026
16 checks passed

hallerite deleted the feat/preserve-thinking-flags-main branch May 17, 2026 10:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(renderer): expose preserve_*_thinking flags in RendererConfig#2436

feat(renderer): expose preserve_*_thinking flags in RendererConfig#2436
samsja merged 2 commits into
mainfrom
feat/preserve-thinking-flags-main

hallerite commented May 7, 2026 •

edited

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

hallerite commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Per-flag semantics

Example config

Dependencies

File-path delta vs #2433

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hallerite commented May 7, 2026 •

edited

Loading